Efficient biased estimation of evolutionary distances when substitution rates vary across sites.
نویسندگان
چکیده
This paper deals with phylogenetic inference when the variability of substitution rates across sites (VRAS) is modeled by a gamma distribution. We show that underestimating VRAS, which results in underestimates for the evolutionary distances between sequences, usually improves the topological accuracy of phylogenetic tree inference by distance-based methods, especially when the molecular clock holds. We propose a method to estimate the gamma shape parameter value which is most suited for tree topology inference, given the sequences at hand. This method is based on the pairwise evolutionary distances between sequences and allows one to reconstruct the phylogeny of a high number of taxa (>1,000). Simulation results show that the topological accuracy is highly improved when using the gamma shape parameter value given by our method, compared with the true (unknown) value which was used to generate the data. Furthermore, when VRAS is high, the topological accuracy of our distance-based method is better than that of a maximum likelihood approach. Finally, a data set of Maoricicada species sequences is analyzed, which confirms the advantage of our method.
منابع مشابه
Patterns of nucleotide substitution in mitochondrial protein coding genes of vertebrates.
Maximum likelihood methods were used to study the differences in substitution rates among the four nucleotides and among different nucleotide sites in mitochondrial protein-coding genes of vertebrates. In the 1st + 2nd codon position data, the frequency of nucleotide G is negatively correlated with evolutionary rates of genes, substitution rates vary substantially among sites, and the transitio...
متن کاملPosition Dependent and Independent Evolutionary Models Based on Empirical Amino Acid Substitution Matrices
Evolutionary models measure the probability of amino acid substitutions occurring over different evolutionary distances. We examine various evolutionary models based on empirically derived amino acid substitution matrices. The models are constructed using the PAM and BLOSUM amino acid substitution matrices. We rescale these matrices by raising them to powers to model substitution patterns that ...
متن کاملMaximum likelihood estimation of phylogenetic trees is consistent when substitution rates vary according to the invariable sites plus gamma distribution.
Maximum likelihood estimation of phylogenetic trees from nucleotide sequences is completely consistent when nucleotide substitution is governed by the general time reversible (GTR) model with rates that vary over sites according to the invariable sites plus gamma (I + gamma) distribution.
متن کاملWhen is it safe to use an oversimplified substitution model in tree-making?
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary dista...
متن کاملEstimation of evolutionary distances under stationary and nonstationary models of nucleotide substitution.
Estimation of evolutionary distances has always been a major issue in the study of molecular evolution because evolutionary distances are required for estimating the rate of evolution in a gene, the divergence dates between genes or organisms, and the relationships among genes or organisms. Other closely related issues are the estimation of the pattern of nucleotide substitution, the estimation...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Molecular biology and evolution
دوره 19 4 شماره
صفحات -
تاریخ انتشار 2002